[shardformer] merge shardformer to main #4152

FrankLeeeee · 2023-07-04T05:50:03Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

N/A

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

This PR releases the shardformer feature to the main branch.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

* init shardformer code structure * add implementation of sharder (inject and replace) * add implementation of replacing layers with Colossal layers * separate different layer policies, add some comments * implement 1D and 2D slicers, which can tell column from row * fix bug when slicing and injecting model * fix some bugs; add inference test example * add shared weight and train example * add train * add docstring and readme * add docstring for other files * pre-commit
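The slicer item above distinguishes column splits from row splits. As a hedged illustration of why that matters for 1D tensor parallelism (this is not the shardformer implementation; every function name below is hypothetical), the pure-Python sketch simulates a linear layer `y = x @ W` sharded across `n` ranks in a single process. A column split gives each rank a slice of the output, so the shards are concatenated; a row split gives each rank a partial sum of the full output, so the shards are added, which is what an all-reduce does on real devices.

```python
from typing import List

Matrix = List[List[int]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w for x of shape (m, k) and w of shape (k, n)."""
    return [
        [sum(x[i][p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


def split_cols(w: Matrix, n: int) -> List[Matrix]:
    """Split w along its output (column) dimension into n equal shards."""
    size = len(w[0]) // n
    return [[row[r * size:(r + 1) * size] for row in w] for r in range(n)]


def split_rows(w: Matrix, n: int) -> List[Matrix]:
    """Split w along its input (row) dimension into n equal shards."""
    size = len(w) // n
    return [w[r * size:(r + 1) * size] for r in range(n)]


def column_parallel(x: Matrix, w: Matrix, n: int) -> Matrix:
    # Each simulated rank sees the full input and owns some output columns,
    # so the full output is the concatenation of the per-rank outputs.
    partials = [matmul(x, shard) for shard in split_cols(w, n)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


def row_parallel(x: Matrix, w: Matrix, n: int) -> Matrix:
    # Each simulated rank sees only its slice of the input features and
    # produces a partial sum of the full output; summing the partials is
    # the job an all-reduce does on real devices.
    size = len(x[0]) // n
    x_shards = [[row[r * size:(r + 1) * size] for row in x] for r in range(n)]
    partials = [matmul(xs, ws) for xs, ws in zip(x_shards, split_rows(w, n))]
    return [
        [sum(p[i][j] for p in partials) for j in range(len(w[0]))]
        for i in range(len(x))
    ]
```

The sketch assumes the split dimension divides evenly by `n`. Pairing a column-parallel layer with a row-parallel one, as Megatron-LM does in transformer MLP blocks, is the usual way to need only one reduction per block.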

* add bert align test, fix dist loss bug * forward and backward align * add ignore index * add shardformer CI * add optional gather_output for users in shard config * update readme with optional gather_output * add dist crossentropy loss test, remove unused files * remove unused file * remove unused file * rename the file * polish code
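Several commits in this PR add a distributed cross-entropy loss. One common way to compute cross-entropy when the vocabulary dimension of the logits is sharded across ranks (the approach popularized by Megatron-LM; whether `colossalai/shardformer/layer/loss.py` follows exactly these steps is an assumption) is to replace each reduction over the vocabulary with an all-reduce. The sketch below simulates the ranks in one process with hypothetical function names and checks the result against an unsharded computation.

```python
import math
from typing import List


def full_cross_entropy(logits: List[float], target: int) -> float:
    """Reference: -log softmax(logits)[target], computed on one device."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def vocab_parallel_cross_entropy(logits: List[float], target: int, world_size: int) -> float:
    # Each simulated rank owns a contiguous slice of the vocabulary.
    part = len(logits) // world_size
    shards = [logits[r * part:(r + 1) * part] for r in range(world_size)]

    # Step 1: all-reduce(MAX) of the per-rank maxima, for numerical stability.
    global_max = max(max(s) for s in shards)

    # Step 2: all-reduce(SUM) of the per-rank sums of exp(logit - max).
    sum_exp = sum(sum(math.exp(v - global_max) for v in s) for s in shards)

    # Step 3: only the rank whose slice contains `target` contributes its
    # logit; the others contribute 0, so an all-reduce(SUM) recovers it.
    target_logit = 0.0
    for r, s in enumerate(shards):
        start, end = r * part, (r + 1) * part
        if start <= target < end:
            target_logit += s[target - start]

    return global_max + math.log(sum_exp) - target_logit
```

On real hardware each cross-shard `max` or `sum` would be a `torch.distributed.all_reduce` with `ReduceOp.MAX` or `ReduceOp.SUM`. The sketch assumes the vocabulary size is divisible by `world_size`.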

(#3949) * add dist dropout in model * update docstring and bert policy with dropout * refactor basepolicy and sharded, update bert * update format * update gpt2 policy * update bert policy * remove unused code * update readme for new policy usage

* add dist dropout in model * update docstring and bert policy with dropout * refactor basepolicy and sharded, update bert * update format * update gpt2 policy * update bert policy * remove unused code * update readme for new policy usage * add downstream model of bert * remove unused code

* first version of ViT shardformer * keep vit * update * vit shard: add ViTAttention, ViTLayer * update num-head shard parameter * finish test for vit * add new_model_class & postprocess * add vit readme * delete old files & fix the conflict * fix something

github-actions · 2023-07-04T06:56:54Z

The code coverage for the changed files is 84%.

Name                                                                                       Stmts   Miss  Cover
--------------------------------------------------------------------------------------------------------------
colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py                           164     82    50%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py     388     76    80%
colossalai/auto_parallel/tensor_shard/utils/misc.py                                           45      9    80%
colossalai/checkpoint_io/utils.py                                                            243     43    82%
colossalai/device/device_mesh.py                                                             178     14    92%
colossalai/lazy/lazy_init.py                                                                 299     40    87%
colossalai/nn/layer/base_layer.py                                                             36     15    58%
colossalai/nn/layer/parallel_1d/_operation.py                                                 53     26    51%
colossalai/shardformer/__init__.py                                                             1      0   100%
colossalai/shardformer/_utils.py                                                              42     15    64%
colossalai/shardformer/layer/__init__.py                                                       7      0   100%
colossalai/shardformer/layer/_operation.py                                                   152     51    66%
colossalai/shardformer/layer/dropout.py                                                       35      0   100%
colossalai/shardformer/layer/embedding.py                                                    118      5    96%
colossalai/shardformer/layer/linear.py                                                       156     23    85%
colossalai/shardformer/layer/loss.py                                                          49      8    84%
colossalai/shardformer/layer/normalization.py                                                 50     10    80%
colossalai/shardformer/layer/parallel_module.py                                               76     21    72%
colossalai/shardformer/layer/qkv_fused_linear.py                                             196     29    85%
colossalai/shardformer/layer/utils.py                                                         81     10    88%
colossalai/shardformer/modeling/__init__.py                                                    0      0   100%
colossalai/shardformer/modeling/bloom.py                                                      29      5    83%
colossalai/shardformer/policies/__init__.py                                                    0      0   100%
colossalai/shardformer/policies/autopolicy.py                                                 27      2    93%
colossalai/shardformer/policies/basepolicy.py                                                 50      5    90%
colossalai/shardformer/policies/bert.py                                                      109      2    98%
colossalai/shardformer/policies/bloom.py                                                      60      3    95%
colossalai/shardformer/policies/gpt2.py                                                       68      2    97%
colossalai/shardformer/policies/llama.py                                                      43      2    95%
colossalai/shardformer/policies/opt.py                                                        50      2    96%
colossalai/shardformer/policies/t5.py                                                         74      2    97%
colossalai/shardformer/policies/vit.py                                                        26     26     0%
colossalai/shardformer/shard/__init__.py                                                       4      0   100%
colossalai/shardformer/shard/shard_config.py                                                  21      2    90%
colossalai/shardformer/shard/sharder.py                                                       66      5    92%
colossalai/shardformer/shard/shardformer.py                                                   13      0   100%
colossalai/tensor/comm_spec.py                                                               253     93    63%
colossalai/tensor/d_tensor/__init__.py                                                         4      0   100%
colossalai/tensor/d_tensor/api.py                                                            136     18    87%
colossalai/tensor/d_tensor/comm_spec.py                                                      151     35    77%
colossalai/tensor/d_tensor/layout.py                                                          38      1    97%
colossalai/tensor/d_tensor/layout_converter.py                                               195     12    94%
colossalai/tensor/d_tensor/utils.py                                                           38      7    82%
colossalai/tensor/shape_consistency.py                                                       294    120    59%
colossalai/tensor/sharding_spec.py                                                           139     13    91%
colossalai/testing/__init__.py                                                                 4      0   100%
colossalai/testing/comparison.py                                                              54      9    83%
tests/kit/model_zoo/registry.py                                                               18      0   100%
tests/kit/model_zoo/transformers/__init__.py                                                   7      0   100%
tests/kit/model_zoo/transformers/bert.py                                                      42      0   100%
tests/kit/model_zoo/transformers/bloom.py                                                     34      0   100%
tests/kit/model_zoo/transformers/gpt.py                                                       28      0   100%
tests/kit/model_zoo/transformers/llama.py                                                     26      2    92%
tests/kit/model_zoo/transformers/opt.py                                                       32      0   100%
tests/kit/model_zoo/transformers/t5.py                                                        24      0   100%
tests/test_autochunk/test_autochunk_diffuser/test_autochunk_unet.py                           36     10    72%
tests/test_booster/test_mixed_precision/test_fp16_torch.py                                    30      1    97%
tests/test_booster/test_plugin/test_gemini_plugin.py                                          74     10    86%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py                                  60      6    90%
tests/test_booster/test_plugin/test_torch_ddp_plugin.py                                       78      0   100%
tests/test_booster/test_plugin/test_torch_fsdp_plugin.py                                      43      0   100%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py                                         80      0   100%
tests/test_device/test_device_mesh.py                                                         58     36    38%
tests/test_device/test_init_logical_pg.py                                                     27      1    96%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                                    21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_albert.py                                     17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                                       15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_diffuser.py                                   50     28    44%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                                        17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_opt.py                                        15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_t5.py                                         17      1    94%
tests/test_fx/test_tracer/test_timm_model/test_timm_model.py                                  36     24    33%
tests/test_fx/test_tracer/test_torchaudio_model/test_torchaudio_model.py                      14      5    64%
tests/test_fx/test_tracer/test_torchrec_model/test_deepfm_model.py                            39      3    92%
tests/test_fx/test_tracer/test_torchrec_model/test_dlrm_model.py                              41      4    90%
tests/test_fx/test_tracer/test_torchvision_model/test_torchvision_model.py                    31      1    97%
tests/test_lazy/lazy_init_utils.py                                                            72     14    81%
tests/test_lazy/test_distribute.py                                                            73      3    96%
tests/test_lazy/test_models.py                                                                13      1    92%
tests/test_shardformer/__init__.py                                                             0      0   100%
tests/test_shardformer/test_layer/test_dist_crossentropy.py                                   27      1    96%
tests/test_shardformer/test_layer/test_dropout.py                                             42      1    98%
tests/test_shardformer/test_layer/test_embedding.py                                           30      1    97%
tests/test_shardformer/test_layer/test_layernorm.py                                           27      1    96%
tests/test_shardformer/test_layer/test_linear_1d.py                                           85      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py                                 73      1    99%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py                         32      1    97%
tests/test_shardformer/test_model/__init__.py                                                  0      0   100%
tests/test_shardformer/test_model/_utils.py                                                   21      0   100%
tests/test_shardformer/test_model/test_shard_bert.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                                         56      1    98%
tests/test_shardformer/test_model/test_shard_gpt2.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_llama.py                                         58      1    98%
tests/test_shardformer/test_model/test_shard_opt.py                                           59      1    98%
tests/test_shardformer/test_model/test_shard_t5.py                                            64      1    98%
tests/test_shardformer/test_model/test_shard_vit.py                                           35     20    43%
tests/test_shardformer/test_with_torch_ddp.py                                                 46      2    96%
tests/test_tensor/test_dtensor/test_comm_spec.py                                              78      1    99%
tests/test_tensor/test_dtensor/test_dtensor.py                                                65      5    92%
tests/test_tensor/test_dtensor/test_layout_converter.py                                       91      1    99%
tests/test_tensor/test_shape_consistency.py                                                   50      2    96%
tests/test_tensor/test_sharded_linear.py                                                     130      1    99%
tests/test_tensor/test_sharding_spec.py                                                       13      1    92%
--------------------------------------------------------------------------------------------------------------
TOTAL                                                                                       6677   1045    84%
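The Cover column is the fraction of statements executed, `(Stmts - Miss) / Stmts`. A minimal sketch of that arithmetic, with a hypothetical helper name, reproduces both totals reported by the bot: 5632 of 6677 statements (84%) for the first run and 5611 of 6627 (85%) for the second. Plain rounding is a simplification; coverage.py's own display may treat values very close to 0% or 100% differently.

```python
def coverage_percent(stmts: int, miss: int) -> int:
    """Whole-number percentage of statements executed."""
    if stmts == 0:
        # Files with no statements (e.g. empty __init__.py) are reported as 100%.
        return 100
    return round(100 * (stmts - miss) / stmts)
```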

github-actions · 2023-07-04T08:04:52Z

The code coverage for the changed files is 85%.

Name                                                                                       Stmts   Miss  Cover
--------------------------------------------------------------------------------------------------------------
colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py                           164     82    50%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py     388     76    80%
colossalai/auto_parallel/tensor_shard/utils/misc.py                                           45      9    80%
colossalai/checkpoint_io/utils.py                                                            243     43    82%
colossalai/device/device_mesh.py                                                             178     14    92%
colossalai/lazy/lazy_init.py                                                                 299     40    87%
colossalai/nn/layer/base_layer.py                                                             36     15    58%
colossalai/nn/layer/parallel_1d/_operation.py                                                 53     26    51%
colossalai/shardformer/__init__.py                                                             1      0   100%
colossalai/shardformer/_utils.py                                                              42     15    64%
colossalai/shardformer/layer/__init__.py                                                       7      0   100%
colossalai/shardformer/layer/_operation.py                                                   152     51    66%
colossalai/shardformer/layer/dropout.py                                                       35      0   100%
colossalai/shardformer/layer/embedding.py                                                    118      5    96%
colossalai/shardformer/layer/linear.py                                                       156     23    85%
colossalai/shardformer/layer/loss.py                                                          49      8    84%
colossalai/shardformer/layer/normalization.py                                                 50     10    80%
colossalai/shardformer/layer/parallel_module.py                                               76     21    72%
colossalai/shardformer/layer/qkv_fused_linear.py                                             196     29    85%
colossalai/shardformer/layer/utils.py                                                         81     10    88%
colossalai/shardformer/modeling/__init__.py                                                    0      0   100%
colossalai/shardformer/modeling/bloom.py                                                      29      5    83%
colossalai/shardformer/policies/__init__.py                                                    0      0   100%
colossalai/shardformer/policies/autopolicy.py                                                 27      2    93%
colossalai/shardformer/policies/basepolicy.py                                                 50      5    90%
colossalai/shardformer/policies/bert.py                                                      109      2    98%
colossalai/shardformer/policies/bloom.py                                                      60      3    95%
colossalai/shardformer/policies/gpt2.py                                                       68      2    97%
colossalai/shardformer/policies/llama.py                                                      43      2    95%
colossalai/shardformer/policies/opt.py                                                        50      2    96%
colossalai/shardformer/policies/t5.py                                                         74      2    97%
colossalai/shardformer/policies/vit.py                                                        26     26     0%
colossalai/shardformer/shard/__init__.py                                                       4      0   100%
colossalai/shardformer/shard/shard_config.py                                                  21      2    90%
colossalai/shardformer/shard/sharder.py                                                       66      5    92%
colossalai/shardformer/shard/shardformer.py                                                   13      0   100%
colossalai/tensor/comm_spec.py                                                               253     93    63%
colossalai/tensor/d_tensor/__init__.py                                                         4      0   100%
colossalai/tensor/d_tensor/api.py                                                            136     18    87%
colossalai/tensor/d_tensor/comm_spec.py                                                      151     35    77%
colossalai/tensor/d_tensor/layout.py                                                          38      1    97%
colossalai/tensor/d_tensor/layout_converter.py                                               195     12    94%
colossalai/tensor/d_tensor/utils.py                                                           38      7    82%
colossalai/tensor/shape_consistency.py                                                       294    120    59%
colossalai/tensor/sharding_spec.py                                                           139     13    91%
colossalai/testing/__init__.py                                                                 4      0   100%
colossalai/testing/comparison.py                                                              54      9    83%
tests/kit/model_zoo/registry.py                                                               18      0   100%
tests/kit/model_zoo/transformers/__init__.py                                                   7      0   100%
tests/kit/model_zoo/transformers/bert.py                                                      42      0   100%
tests/kit/model_zoo/transformers/bloom.py                                                     34      0   100%
tests/kit/model_zoo/transformers/gpt.py                                                       28      0   100%
tests/kit/model_zoo/transformers/llama.py                                                     26      2    92%
tests/kit/model_zoo/transformers/opt.py                                                       32      0   100%
tests/kit/model_zoo/transformers/t5.py                                                        24      0   100%
tests/test_autochunk/test_autochunk_diffuser/test_autochunk_unet.py                           36     10    72%
tests/test_booster/test_mixed_precision/test_fp16_torch.py                                    30      1    97%
tests/test_booster/test_plugin/test_gemini_plugin.py                                          74     10    86%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py                                  60      6    90%
tests/test_booster/test_plugin/test_torch_ddp_plugin.py                                       78      0   100%
tests/test_booster/test_plugin/test_torch_fsdp_plugin.py                                      43      0   100%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py                                         80      0   100%
tests/test_device/test_device_mesh.py                                                         58     36    38%
tests/test_device/test_init_logical_pg.py                                                     27      1    96%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                                    21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_albert.py                                     17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                                       15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_diffuser.py                                   50     28    44%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                                        17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_opt.py                                        15      1    93%
tests/test_fx/test_tracer/test_hf_model/test_hf_t5.py                                         17      1    94%
tests/test_fx/test_tracer/test_torchrec_model/test_deepfm_model.py                            39      3    92%
tests/test_fx/test_tracer/test_torchrec_model/test_dlrm_model.py                              41      4    90%
tests/test_fx/test_tracer/test_torchvision_model/test_torchvision_model.py                    31      1    97%
tests/test_lazy/lazy_init_utils.py                                                            72     14    81%
tests/test_lazy/test_distribute.py                                                            73      3    96%
tests/test_lazy/test_models.py                                                                13      1    92%
tests/test_shardformer/__init__.py                                                             0      0   100%
tests/test_shardformer/test_layer/test_dist_crossentropy.py                                   27      1    96%
tests/test_shardformer/test_layer/test_dropout.py                                             42      1    98%
tests/test_shardformer/test_layer/test_embedding.py                                           30      1    97%
tests/test_shardformer/test_layer/test_layernorm.py                                           27      1    96%
tests/test_shardformer/test_layer/test_linear_1d.py                                           85      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py                                 73      1    99%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py                         32      1    97%
tests/test_shardformer/test_model/__init__.py                                                  0      0   100%
tests/test_shardformer/test_model/_utils.py                                                   21      0   100%
tests/test_shardformer/test_model/test_shard_bert.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                                         56      1    98%
tests/test_shardformer/test_model/test_shard_gpt2.py                                          56      1    98%
tests/test_shardformer/test_model/test_shard_llama.py                                         58      1    98%
tests/test_shardformer/test_model/test_shard_opt.py                                           59      1    98%
tests/test_shardformer/test_model/test_shard_t5.py                                            64      1    98%
tests/test_shardformer/test_model/test_shard_vit.py                                           35     20    43%
tests/test_shardformer/test_with_torch_ddp.py                                                 46      2    96%
tests/test_tensor/test_dtensor/test_comm_spec.py                                              78      1    99%
tests/test_tensor/test_dtensor/test_dtensor.py                                                65      5    92%
tests/test_tensor/test_dtensor/test_layout_converter.py                                       91      1    99%
tests/test_tensor/test_shape_consistency.py                                                   50      2    96%
tests/test_tensor/test_sharded_linear.py                                                     130      1    99%
tests/test_tensor/test_sharding_spec.py                                                       13      1    92%
--------------------------------------------------------------------------------------------------------------
TOTAL                                                                                       6627   1016    85%

FoolPlayer and others added 30 commits June 26, 2023 10:06

[shardformer] updated readme (#3827)

69d3daa

[shardformer] refactored the user api (#3828)

0470f1b

* [shardformer] refactored the user api * polish code

[shardformer] update readme with modules implement doc (#3834)

051e970

* update readme with modules content * remove img

[shardformer] add Dropout layer support different dropout pattern (#3856)

3e840f7

* add dropout layer, add dropout test * modify seed manager as context manager * add a copy of col_nn.layer * add dist_crossentropy loss; separate module test * polish the code * fix dist crossentropy loss
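Commit #3856 turns the dropout seed manager into a context manager. The sketch below uses Python's `random` module as a stand-in for a per-device RNG, and the names `fork_rng` and `dropout` are hypothetical. It shows why that shape is convenient: a caller can draw a dropout mask under a chosen seed (the same seed on every rank for replicated activations, a per-rank seed for sharded ones) without disturbing the surrounding random stream.

```python
import contextlib
import random
from typing import Iterator, List


@contextlib.contextmanager
def fork_rng(seed: int) -> Iterator[None]:
    """Temporarily reseed the global RNG, then restore its previous state."""
    saved = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        # Restore even if the body raises, so outer code sees an untouched stream.
        random.setstate(saved)


def dropout(values: List[float], p: float) -> List[float]:
    # Zero each element with probability p and rescale survivors by 1 / (1 - p).
    return [0.0 if random.random() < p else v / (1 - p) for v in values]
```

Two ranks that enter `fork_rng` with the same seed produce identical masks, and different seeds give independent masks; in PyTorch the analogous tool would be `torch.random.fork_rng` or saving and restoring the CUDA RNG state.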

update README (#3909)

bf9c2fd

[shardformer] add gpt2 policy and modify shard and slicer to support (#3883)

551fec3

* add gpt2 policy and modify shard and slicer to support * remove unused code * polish code
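The GPT-2 policy added here is one of several per-model policies (`policies/bert.py`, `policies/gpt2.py`, and so on) selected by `policies/autopolicy.py`. The sketch below shows one plausible way such automatic selection can work, keyed on the model's fully qualified class name. The classes `Policy`, `BertPolicy`, `GPT2Policy`, the registry and `lookup_policy` are hypothetical and are not the shardformer API.

```python
from typing import Dict, Type


class Policy:
    """Hypothetical base class; one subclass per supported model family."""
    model_family = "base"


class BertPolicy(Policy):
    model_family = "bert"


class GPT2Policy(Policy):
    model_family = "gpt2"


# Keys are "module.ClassName" strings, so defining the registry never imports
# the model library itself. The entries are illustrative.
_POLICY_REGISTRY: Dict[str, Type[Policy]] = {
    "transformers.models.bert.modeling_bert.BertModel": BertPolicy,
    "transformers.models.gpt2.modeling_gpt2.GPT2Model": GPT2Policy,
    "transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel": GPT2Policy,
}


def lookup_policy(model: object) -> Policy:
    """Return a policy instance for `model`, or raise if none is registered."""
    cls = type(model)
    key = f"{cls.__module__}.{cls.__name__}"
    policy_cls = _POLICY_REGISTRY.get(key)
    if policy_cls is None:
        raise NotImplementedError(f"no sharding policy registered for {key}")
    return policy_cls()
```

Keying on strings keeps `transformers` out of the import path until a model is actually passed in. That fits the later "import huggingface implicitly" change (#4101), although this sketch does not claim that is how #4101 was implemented.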

[shardformer] Unit test (#3928)

661dc3b

* fix bug in slicer, add slicer unit test * add dropout test * use pid as dropout seed * update dropout test with local pattern * add todo

[shardformer] support llama model using shardformer (#3969)

17d1607

adjust layer attr

[shardformer] shardformer support t5 model (#3994)

e849d1b

test t5

[shardformer] fix an error in readme (#3988)

73cacb7

* fix an error in readme * simplify code

[device] support init device mesh from process group (#3990)

45a3110

[shardformer] Refactor shardformer api (#4001)

18396e7

* fix an error in readme * simplify code * refactor shardformer * add todo * remove slicer * resolve code review

[shardformer] integrated linear 1D with dtensor (#3996)

579b617

* [shardformer] integrated linear 1D with dtensor * polish code

integrate with dist layer (#4011)

bdc405e

[shardformer] refactored embedding and dropout to parallel module (#4013)

2c366e3

* [shardformer] refactored embedding and dropout to parallel module * polish code

[shardformer] removed inplace tensor sharding (#4018)

eaa46d7

add vocabembedding layer

60eb380
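The vocab embedding layer added here shards the embedding table along the vocabulary dimension. Below is a minimal single-process sketch of the usual forward pass, assuming the vocabulary divides evenly across ranks; the function names are hypothetical. Each rank looks up only the token ids inside its own vocabulary range and writes zeros for the rest, and summing the per-rank results (an all-reduce on real devices) recovers the full lookup.

```python
from typing import List

Table = List[List[float]]


def embedding_lookup(table: Table, ids: List[int]) -> Table:
    """Reference: an ordinary, unsharded embedding lookup."""
    return [list(table[i]) for i in ids]


def vocab_parallel_embedding(table: Table, ids: List[int], world_size: int) -> Table:
    dim = len(table[0])
    part = len(table) // world_size
    out = [[0.0] * dim for _ in ids]
    for rank in range(world_size):
        start, end = rank * part, (rank + 1) * part
        local_table = table[start:end]  # the only rows this rank stores
        for pos, tok in enumerate(ids):
            # Tokens outside this rank's range contribute a zero vector ...
            row = local_table[tok - start] if start <= tok < end else [0.0] * dim
            # ... so accumulating over ranks (the all-reduce) gives the full result.
            out[pos] = [a + b for a, b in zip(out[pos], row)]
    return out
```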

support bert with new api

90e1a0a

[shardformer] updated doc (#4016)

38ceded

[shardformer] fix bert and gpt downstream with new api (#4024)

c982769

* fix bert downstream with new api * remove comment line

[shardformer] adapted llama to the new API (#4036)

b2c5dd0

[shardformer] supported T5 and its variants (#4045)

8219d96

[shardformer] add gpt2 test and layer class refactor (#4041)

0113097

* add gpt2 test and layer class refactor * add dropout in gpt2 policy

[shardformer] adapted T5 and LLaMa test to use kit (#4049)

ac3aef3

* [shardformer] adapted T5 and LLaMa test to use kit * polish code

[shardformer] refactored the shardformer layer structure (#4053)

e5d4a87

FoolPlayer and others added 18 commits June 26, 2023 10:10

support kit use for bert/gpt test (#4055)

d5d9178

* support kit use for bert test * support kit test for gpt2

[shardformer] support module saving and loading (#4062)

9436f73

* [shardformer] support module saving and loading * polish code

[shardformer] add linearconv1d test (#4067)

8108c35

* add linearconv1d test * add linearconv1d test

[shardformer] supported fused qkv checkpoint (#4073)

a484c71
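Fused QKV weights need special handling when they are sharded and checkpointed. Assume the weight stacks Q, K and V along its output dimension (rows in this sketch; GPT-2's `Conv1D` actually keeps the fused output along its last dimension). A contiguous split would then give rank 0 all of Q and part of K. The hedged sketch below, with hypothetical names, splits each projection separately so rank `r` holds `[Q_r; K_r; V_r]`, and shows the inverse gather needed to write a full checkpoint.

```python
from typing import List

Rows = List[List[int]]


def shard_fused_qkv(weight: Rows, world_size: int) -> List[Rows]:
    """Split a fused [Q; K; V] weight so each rank gets its slice of all three."""
    if len(weight) % 3 != 0 or (len(weight) // 3) % world_size != 0:
        raise ValueError("each of Q, K, V must split evenly across ranks")
    third = len(weight) // 3
    q, k, v = weight[:third], weight[third:2 * third], weight[2 * third:]
    part = third // world_size
    return [
        q[r * part:(r + 1) * part] + k[r * part:(r + 1) * part] + v[r * part:(r + 1) * part]
        for r in range(world_size)
    ]


def gather_fused_qkv(shards: List[Rows]) -> Rows:
    """Inverse of shard_fused_qkv: rebuild the full [Q; K; V] weight."""
    part = len(shards[0]) // 3
    q = [row for s in shards for row in s[:part]]
    k = [row for s in shards for row in s[part:2 * part]]
    v = [row for s in shards for row in s[2 * part:]]
    return q + k + v
```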

[shardformer] Add layernorm (#4072)

12801e8

* add layernorm to bert * add layernorm test * add layernorm test with load state dict * add use_mixedfusedLN in shard config * refactor policy to support fused_layernorm

[test] fixed tests failed due to dtensor change (#4082)

d88844c

* [test] fixed tests failed due to dtensor change * polish code

[shardformer] refactored layernorm (#4086)

4e0db99

[shardformer] shardformer support opt models (#4091)

a7433a0

* [shardformer] shardformer support opt models * [shardformer] shardformer support opt models, fix * [shardformer] shardformer support opt models, fix * [shardformer] shardformer support opt models, fix

[shardformer] supported bloom model (#4098)

8b0930c

[shardformer] supported fused normalization (#4112)

92e669e

[shardformer] integrate with data parallelism (#4103)

8d3f077
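Combining tensor parallelism with data parallelism means every rank belongs to one tensor-parallel group and one data-parallel group. A common layout, assumed here rather than taken from this PR, puts consecutive ranks in the same tensor-parallel group (typically GPUs on one node) and strides the data-parallel groups across them. The function name `build_groups` is hypothetical; in PyTorch each returned rank list could be passed to `torch.distributed.new_group(ranks=...)`.

```python
from typing import List, Tuple


def build_groups(world_size: int, tp_size: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Partition ranks into tensor-parallel (TP) and data-parallel (DP) groups."""
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    dp_size = world_size // tp_size
    # TP groups are contiguous blocks of tp_size ranks.
    tp_groups = [list(range(d * tp_size, (d + 1) * tp_size)) for d in range(dp_size)]
    # DP groups take the same position from every TP block (stride tp_size).
    dp_groups = [list(range(t, world_size, tp_size)) for t in range(tp_size)]
    return tp_groups, dp_groups
```

With 8 ranks and `tp_size=2`, ranks 0 and 1 share one model replica's shards, while ranks 0, 2, 4 and 6 hold the same shard and average its gradients.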

[shardformer] import huggingface implicitly (#4101)

60d2cad

[shardformer] added embedding gradient check (#4124)

26ecfd7

[shardformer] write a shardformer example with bert finetuning (#4126)

b6f4e05

* [shardformer] add benchmark of shardformer * [shardformer] add benchmark of shardformer

[shardformer] refactored some doc and api (#4137)

1b4a901

* [shardformer] refactored some doc and api * polish code

[shardformer] made tensor parallelism configurable (#4144)

f8dcf9d

* [shardformer] made tensor parallelism configurable * polish code

[shardformer] added development protocol for standardization (#4149)

d1db043

ver217 requested changes Jul 4, 2023

applications/Chat/coati/trainer/.sft.py.swp (outdated)

colossalai/shardformer/examples/shardformer_benchmark.py

FrankLeeeee mentioned this pull request Jul 4, 2023

[chat] removed cache file #4155

Merged

[chat] removed cache file (#4155)

dd9fe39

ver217 approved these changes Jul 4, 2023

FrankLeeeee merged commit f447ca1 into main Jul 4, 2023
6 checks passed

FrankLeeeee deleted the feature/shardformer branch July 4, 2023 08:05

FrankLeeeee added the shardformer label Jul 4, 2023

FrankLeeeee self-assigned this Jul 4, 2023
